AITopics | subjective task

Collaborating Authors

subjective task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diversity-Enhanced Reasoning for Subjective Questions

Wang, Yumeng, Fan, Zhiyuan, Liu, Jiayu, Huang, Jen-tse, Fung, Yi R.

arXiv.org Artificial IntelligenceOct-3-2025

Large Reasoning Models (LRMs) with long chain-of-thought capabilities, optimized via reinforcement learning with verifiable rewards (RLVR), excel at objective reasoning tasks like mathematical problem solving and code generation. However, RLVR is known for degrading generation diversity, which causes LRMs to fall short on subjective reasoning that has multiple answers depending on different role perspectives. While recent studies recognize the importance of diversity-enhanced training in objective reasoning, limited attention has been given to subjective tasks. In this paper, we find that subjective reasoning can be improved by introducing perspective diversity and token-level diversity, with the former one providing a coherent scaffolding anchored to a real-world stakeholder group and the latter one broadening the answer search space. We propose MultiRole-R1, a diversity-enhanced training framework featuring an unsupervised data construction pipeline that synthesizes reasoning chains incorporating various role perspectives. It also employs reinforcement learning via Group Relative Policy Optimization with reward shaping, taking diversity as a reward signal in addition to verifiable reward. Training on subjective tasks solely, MultiRole-R1 increases the in-domain and out-of-domain accuracy by 14.1% and 7.64%, and even enhances the performance on advanced math reasoning such as AIME 2024. We further show that diversity is a more consistent indicator of accuracy than reasoning length.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.20187

Country:

Europe (1.00)
Asia (0.92)
North America > United States > California (0.45)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting (0.47)
Health & Medicine > Health Care Providers & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
(2 more...)

Add feedback

Exploring Subjective Tasks in Farsi: A Survey Analysis and Evaluation of Language Models

Rooein, Donya, Plaza-del-Arco, Flor Miriam, Nozza, Debora, Hovy, Dirk

arXiv.org Artificial IntelligenceSep-9-2025

Given Farsi's speaker base of over 127 million people and the growing availability of digital text, including more than 1.3 million articles on Wikipedia, it is considered a middle-resource language. However, this label quickly crumbles when the situation is examined more closely. We focus on three subjective tasks (Sentiment Analysis, Emotion Analysis, and Toxicity Detection) and find significant challenges in data availability and quality, despite the overall increase in data availability. We review 110 publications on subjective tasks in Farsi and observe a lack of publicly available datasets. Furthermore, existing datasets often lack essential demographic factors, such as age and gender, that are crucial for accurately modeling subjectivity in language. When evaluating prediction models using the few available datasets, the results are highly unstable across both datasets and models. Our findings indicate that the volume of data is insufficient to significantly improve a language's prospects in NLP.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.05719

Country:

Asia (0.93)
North America > Canada (0.46)
North America > Mexico (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

ClimateChat: Designing Data and Methods for Instruction Tuning LLMs to Answer Climate Change Queries

Chen, Zhou, Wang, Xiao, Liao, Yuanhong, Lin, Ming, Bai, Yuqi

arXiv.org Artificial IntelligenceJun-18-2025

As the issue of global climate change becomes increasingly severe, the demand for research in climate science continues to grow. Natural language processing technologies, represented by Large Language Models (LLMs), have been widely applied to climate change-specific research, providing essential information support for decision-makers and the public. Some studies have improved model performance on relevant tasks by constructing climate change-related instruction data and instruction-tuning LLMs. However, current research remains inadequate in efficiently producing large volumes of high-precision instruction data for climate change, which limits further development of climate change LLMs. This study introduces an automated method for constructing instruction data. The method generates instructions using facts and background knowledge from documents and enhances the diversity of the instruction data through web scraping and the collection of seed instructions. Using this method, we constructed a climate change instruction dataset, named ClimateChat-Corpus, which was used to fine-tune open-source LLMs, resulting in an LLM named ClimateChat. Evaluation results show that ClimateChat significantly improves performance on climate change question-and-answer tasks. Additionally, we evaluated the impact of different base models and instruction data on LLM performance and demonstrated its capability to adapt to a wide range of climate change scientific discovery tasks, emphasizing the importance of selecting an appropriate base model for instruction tuning. This research provides valuable references and empirical support for constructing climate change instruction data and training climate change-specific LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.13796

Country:

Asia > China (0.04)
North America > United States (0.04)
North America > Greenland (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Bridging the Gap: In-Context Learning for Modeling Human Disagreement

Muscato, Benedetta, Li, Yue, Gezici, Gizem, Zhao, Zhixue, Giannotti, Fosca

arXiv.org Artificial IntelligenceJun-9-2025

Large Language Models (LLMs) have shown strong performance on NLP classification tasks. However, they typically rely on aggregated labels-often via majority voting-which can obscure the human disagreement inherent in subjective annotations. This study examines whether LLMs can capture multiple perspectives and reflect annotator disagreement in subjective tasks such as hate speech and offensive language detection. We use in-context learning (ICL) in zero-shot and few-shot settings, evaluating four open-source LLMs across three label modeling strategies: aggregated hard labels, and disaggregated hard and soft labels. In few-shot prompting, we assess demonstration selection methods based on textual similarity (BM25, PLM-based), annotation disagreement (entropy), a combined ranking, and example ordering strategies (random vs. curriculum-based). Results show that multi-perspective generation is viable in zero-shot settings, while few-shot setups often fail to capture the full spectrum of human judgments. Prompt design and demonstration selection notably affect performance, though example ordering has limited impact. These findings highlight the challenges of modeling subjectivity with LLMs and the importance of building more perspective-aware, socially intelligent models.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.06113

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government (0.69)
Law Enforcement & Public Safety > Terrorism (0.46)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Maximizing Signal in Human-Model Preference Alignment

Kraus, Kelsey, Kroll, Margaret

arXiv.org Artificial IntelligenceMar-6-2025

The emergence of powerful LLMs has led to a paradigm shift in Natural Language Understanding and Natural Language Generation. The properties that make LLMs so valuable for these tasks -- creativity, ability to produce fluent speech, and ability to quickly and effectively abstract information from large corpora -- also present new challenges to evaluating their outputs. The rush to market has led teams to fall back on quick, cost-effective automatic evaluations which offer value, but do not obviate the need for human judgments in model training and evaluation. This paper argues that in cases in which end users need to agree with the decisions made by ML models -- e.g. in toxicity detection or extraction of main points for summarization -- models should be trained and evaluated on data that represent the preferences of those users. We support this argument by explicating the role of human feedback in labeling and judgment tasks for model training and evaluation. First, we propose methods for disentangling noise from signal in labeling tasks. Then we show that noise in labeling disagreement can be minimized by adhering to proven methodological best practices, while signal can be maximized to play an integral role in model training and evaluation tasks. Finally, we illustrate best practices by providing a case study in which two guardrails classifiers are evaluated using human judgments to align final model behavior to user preferences. We aim for this paper to provide researchers and professionals with guidelines to integrating human judgments into their ML and generative AI evaluation toolkit, particularly when working toward achieving accurate and unbiased features that align with users' needs and expectations.

annotator, disagreement, participant, (14 more...)

arXiv.org Artificial Intelligence

2503.0491

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > New Zealand (0.04)
Oceania > Australia (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Perspective Transition of Large Language Models for Solving Subjective Tasks

Wang, Xiaolong, Zhang, Yuanchi, Wang, Ziyue, Xu, Yuzhuang, Luo, Fuwen, Wang, Yile, Li, Peng, Liu, Yang

arXiv.org Artificial IntelligenceJan-15-2025

Large language models (LLMs) have revolutionized the field of natural language processing, enabling remarkable progress in various tasks. Different from objective tasks such as commonsense reasoning and arithmetic question-answering, the performance of LLMs on subjective tasks is still limited, where the perspective on the specific problem plays crucial roles for better interpreting the context and giving proper response. For example, in certain scenarios, LLMs may perform better when answering from an expert role perspective, potentially eliciting their relevant domain knowledge. In contrast, in some scenarios, LLMs may provide more accurate responses when answering from a third-person standpoint, enabling a more comprehensive understanding of the problem and potentially mitigating inherent biases. In this paper, we propose Reasoning through Perspective Transition (RPT), a method based on in-context learning that enables LLMs to dynamically select among direct, role, and third-person perspectives for the best way to solve corresponding subjective problem. Through extensive experiments on totally 12 subjective tasks by using both closed-source and open-source LLMs including GPT-4, GPT-3.5, Llama-3, and Qwen-2, our method outperforms widely used single fixed perspective based methods such as chain-of-thought prompting and expert prompting, highlights the intricate ways that LLMs can adapt their perspectives to provide nuanced and contextually appropriate responses for different problems.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.09265

Country:

Asia > China (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?

Khurana, Urja, Nalisnick, Eric, Fokkens, Antske, Swayamdipta, Swabha

arXiv.org Artificial IntelligenceAug-26-2024

Subjective tasks in NLP have been mostly relegated to objective standards, where the gold label is decided by taking the majority vote. This obfuscates annotator disagreement and the inherent uncertainty of the label. We argue that subjectivity should factor into model decisions and play a direct role via calibration under a selective prediction setting. Specifically, instead of calibrating confidence purely from the model's perspective, we calibrate models for subjective tasks based on crowd worker agreement. Our method, Crowd-Calibrator, models the distance between the distribution of crowd worker labels and the model's own distribution over labels to inform whether the model should abstain from a decision. On two highly subjective tasks, hate speech detection and natural language inference, our experiments show Crowd-Calibrator either outperforms or achieves competitive performance with existing selective prediction baselines. Our findings highlight the value of bringing human decision-making into model predictions.

aclanthology, computational linguistic, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2408.14141

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > California (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(15 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding

Ananthram, Amith, Stengel-Eskin, Elias, Vondrick, Carl, Bansal, Mohit, McKeown, Kathleen

arXiv.org Artificial IntelligenceJun-17-2024

Vision-language models (VLMs) can respond to queries about images in many languages. However, beyond language, culture affects how we see things. For example, individuals from Western cultures focus more on the central figure in an image while individuals from Eastern cultures attend more to scene context. In this work, we present a novel investigation that demonstrates and localizes VLMs' Western bias in image understanding. We evaluate large VLMs across subjective and objective visual tasks with culturally diverse images and annotations. We find that VLMs perform better on the Western subset than the Eastern subset of each task. Controlled experimentation tracing the source of this bias highlights the importance of a diverse language mix in text-only pre-training for building equitable VLMs, even when inference is performed in English. Moreover, while prompting in the language of a target culture can lead to reductions in bias, it is not a substitute for building AI more representative of the world's languages.

reduction, vlm, western bias, (11 more...)

arXiv.org Artificial Intelligence

2406.11665

Country:

North America > United States (0.93)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China (0.04)
Europe > Western Europe (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.63)

Add feedback

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

Wang, Xiaolong, Wang, Yile, Zhang, Yuanchi, Luo, Fuwen, Li, Peng, Sun, Maosong, Liu, Yang

arXiv.org Artificial IntelligenceFeb-27-2024

Large Language Models (LLMs) have achieved remarkable performance in objective tasks such as open-domain question answering and mathematical reasoning, which can often be solved through recalling learned factual knowledge or chain-of-thought style reasoning. However, we find that the performance of LLMs in subjective tasks is still unsatisfactory, such as metaphor recognition, dark humor detection, etc. Compared to objective tasks, subjective tasks focus more on interpretation or emotional response rather than a universally accepted reasoning pathway. Based on the characteristics of the tasks and the strong dialogue-generation capabilities of LLMs, we propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation. The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales, thereby offering potential useful knowledge behind dialogues for giving the final answers. We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks. Experimental results show that RiC can yield significant improvement compared with various baselines.

dialogue, reasoning, subjective task, (15 more...)

arXiv.org Artificial Intelligence

2402.17226

Country:

North America > Mexico (0.04)
Asia > Singapore (0.04)
Asia > India (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Cost-Efficient Subjective Task Annotation and Modeling through Few-Shot Annotator Adaptation

Golazizian, Preni, Omrani, Ali, Ziabari, Alireza S., Dehghani, Morteza

arXiv.org Artificial IntelligenceFeb-21-2024

In subjective NLP tasks, where a single ground truth does not exist, the inclusion of diverse annotators becomes crucial as their unique perspectives significantly influence the annotations. In realistic scenarios, the annotation budget often becomes the main determinant of the number of perspectives (i.e., annotators) included in the data and subsequent modeling. We introduce a novel framework for annotation collection and modeling in subjective tasks that aims to minimize the annotation budget while maximizing the predictive performance for each annotator. Our framework has a two-stage design: first, we rely on a small set of annotators to build a multitask model, and second, we augment the model for a new perspective by strategically annotating a few samples per annotator. To test our framework at scale, we introduce and release a unique dataset, Moral Foundations Subjective Corpus, of 2000 Reddit posts annotated by 24 annotators for moral sentiment. We demonstrate that our framework surpasses the previous SOTA in capturing the annotators' individual perspectives with as little as 25% of the original annotation budget on two datasets. Furthermore, our framework results in more equitable models, reducing the performance disparity among annotators.

annotator, budget, dataset, (16 more...)

arXiv.org Artificial Intelligence

2402.14101

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England (0.04)
Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.49)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback